An Online Learning Approach to Buying and Selling Demand Response

Authors

  • Kia Khezeli
  • Eilyan Bitar
Abstract

We adopt the perspective of an aggregator, which seeks to coordinate its purchase of demand reductions from a fixed group of residential electricity customers, with its sale of the aggregate demand reduction in a two-settlement wholesale energy market. The aggregator procures reductions in demand by offering its customers a uniform price for reductions in consumption relative to their predetermined baselines. Prior to its realization of the aggregate demand reduction, the aggregator must also determine how much energy to sell into the two-settlement energy market. In the day-ahead market, the aggregator commits to a forward contract, which calls for the delivery of energy in the real-time market. The underlying aggregate demand curve, which relates the aggregate demand reduction to the aggregator’s offered price, is assumed to be affine and subject to unobservable, random shocks. Assuming that both the parameters of the demand curve and the distribution of the random shocks are initially unknown to the aggregator, we investigate the extent to which the aggregator might dynamically adapt its offered prices and forward contracts to maximize its expected profit over a time window of T days. Specifically, we design a dynamic pricing and contract offering policy that resolves the aggregator’s need to learn the unknown demand model with its desire to maximize its cumulative expected profit over time. The proposed pricing policy is proven to be asymptotically optimal, exhibiting a regret over T days that is no greater than O(√T).

This work was supported in part by NSF grant ECCS-1351621, NSF grant IIP-1632124, US DoE under the CERTS initiative, and the Simons Institute for the Theory of Computing. This work builds on our preliminary results, presented at the IFAC 2017 World Congress (Khezeli et al., 2017). The current manuscript differs significantly from the conference version in terms of new results, formal proofs, and more detailed technical discussions. We thank Weixuan Lin for many helpful discussions, and his assistance in the proof of Lemma 2.
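The learning problem in the abstract rests on an affine demand curve with additive random shocks whose parameters are unknown at the outset. The sketch below illustrates only that ingredient: it simulates such a curve and recovers its slope and intercept by ordinary least squares. It is not the authors' pricing and contract policy, which the abstract does not specify. The parameter values, the Gaussian shock distribution, the two alternating price levels, and all function names are assumptions made for illustration.

```python
import random


def simulate_demand_reduction(price, slope, intercept, rng, noise_std=1.0):
    """Affine demand curve plus an additive random shock.

    The Gaussian shock is an assumption; the abstract only says the
    shocks are random and unobservable.
    """
    return slope * price + intercept + rng.gauss(0.0, noise_std)


def least_squares_fit(prices, reductions):
    """Ordinary least squares estimate of (slope, intercept)."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_d = sum(reductions) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, reductions))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_p
    return slope, intercept


def run_experiment(days=2000, seed=0):
    """Offer prices for `days` days and estimate the demand curve."""
    rng = random.Random(seed)
    true_slope, true_intercept = 2.0, 5.0  # hypothetical values, illustration only
    prices, reductions = [], []
    for t in range(1, days + 1):
        # Alternate between two price levels so the slope is identifiable.
        price = 1.0 if t % 2 else 3.0
        prices.append(price)
        reductions.append(
            simulate_demand_reduction(price, true_slope, true_intercept, rng)
        )
    return least_squares_fit(prices, reductions)


slope_hat, intercept_hat = run_experiment()
print(f"estimated slope={slope_hat:.3f}, intercept={intercept_hat:.3f}")
```

Alternating between two fixed prices is a deliberately crude form of exploration. A policy that targets low regret would have to balance such price variation against the profit lost by deviating from the price it currently believes to be best.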

Similar Resources

An Electronic Marketplace Based on Reputation and Learning

In this paper, we propose a market model which is based on reputation and reinforcement learning algorithms for buying and selling agents. Three important factors: quality, price and delivery-time are considered in the model. We take into account the fact that buying agents can have different priorities on quality, price and delivery-time of their goods and selling agents adjust their bids acco...

A Reputation and Learning Model for Electronic Commerce Agents

In this paper, reinforcement learning is used in order to model the reputation of buying and selling agents. Two important factors, quality and price, are considered in the proposed model. Each selling agent learns to evaluate the reputation of buying agents, based on their profits for that seller and uses this reputation to dedicate a discount for reputable buying agents. Also, selling agents l...

Optimal Operation of Integrated Energy Systems Considering Demand Response Program

This study presents an optimal framework for the operation of integrated energy systems using demand response programs. The main goal of integrated energy systems is to optimally supply various demands using different energy carriers such as electricity, heating, and cooling. Considering the power market price, this work investigates the effects of multiple energy storage devices and demand res...

Multi–Armed Bandit for Pricing

This paper is about the study of Multi–Armed Bandit (MAB) approaches for pricing applications, where a seller needs to identify the selling price for a particular kind of item that maximizes her/his profit without knowing the buyer demand. We propose modifications to the popular Upper Confidence Bound (UCB) bandit algorithm exploiting two peculiarities of pricing applications: 1) as the selling...

Considering chain to chain competition in forward and reverse logistics of a dynamic and integrated supply chain network design problem

In this paper, a bi-objective model is presented for dynamic and integrated network design of a new entrant competitive closed-loop supply chain. To consider dynamism and integration in the network design problem, multiple long-term periods are regarded during planning horizon, so that each long-term period includes several short-term periods. Furthermore, a chain to chain competition between t...

A Learning Algorithm for Agents in Electronic Marketplaces

In this paper, we propose a reputation oriented reinforcement learning algorithm for buying and selling agents in electronic market environments. We take into account the fact that multiple selling agents may offer the same good with different qualities. In our approach, buying agents learn to avoid the risk of purchasing low quality goods and to maximize their expected value of goods by dynami...

Journal:
  • CoRR

Volume: abs/1707.07342

Publication date: 2017